Abstract

We consider a problem of extrapolating the collision properties of a large polyatomic molecule 
A-H to make predictions of the dynamical properties for another molecule related to A-H by 
the substitution of the H atom with a small molecular group X, without explicitly computing 
the potential energy surface for A-X. We assume that the effect of the -H → -X substitution
is embodied in a multidimensional function with unknown parameters characterizing the change 
of the potential energy surface. We propose to apply the Gaussian Process model to determine 
the dependence of the dynamical observables on the unknown parameters. This can be used 
to produce an interval of the observable values that corresponds to physical variations of the 
potential parameters. We show that the Gaussian Process model combined with classical trajectory 
calculations can be used to obtain the dependence of the cross sections for collisions of C6H5CN with
He on the unknown parameters describing the interaction of the He atom with the CN fragment 
of the molecule. The unknown parameters are then varied within physically reasonable ranges to 
produce a prediction uncertainty of the cross sections. The results are normalized to the cross 
sections for He - C6H6 collisions obtained from quantum scattering calculations in order to provide
a prediction interval of the thermally averaged cross sections for collisions of C6H5CN with He.
INTRODUCTION


This work is motivated by recent measurements of the thermalization dynamics of benzonitrile
C6H5CN in a cold gas of He [1]. The interpretation of the experiments requires
the knowledge of the thermally averaged cross sections for collisions of C6H5CN with He.
However, the potential energy surface for He - C6H5CN interactions is, at present, unknown
and the accurate quantum dynamical calculation of the cross sections for He - C6H5CN collisions
is a formidable task. On the other hand, the interaction of a closely related molecule
C6H6 with He is well characterized and accurate quantum dynamical calculations of the
cross sections for C6H6 - He collisions have been previously reported [3]. This raises the
question: given the cross sections for C6H6 - He scattering, is it possible to make predictions
of the cross sections for C6H5CN - He collisions?


The experimentally measurable cross sections for elastic or inelastic collisions of molecules 
in the gas phase can be computed by means of quantum scattering or classical trajectory 
calculations [3]. Each such calculation uses as input the interaction potential energy surface
(PES) that governs the molecular dynamics. In general, the PES must be computed for 
every collision system before the dynamical calculations. The collision dynamics is typically 
sensitive to details of the PES and the collision properties of different molecules must be 
obtained from independent scattering calculations with the corresponding PESs. However, 
for large polyatomic molecules, the substitution of one atom or molecular group with another 
molecular group may alter only a small part of the global PES. In this case, instead of 
calculating the PES and the dynamical properties for the molecule with the substituted 
molecular group, one may consider determining the effects of the substitution on the collision
properties. This would allow one to make predictions about the collision properties of a 
specific molecule based on the known collision observables for another molecule. 

Extrapolating the collision properties between different molecular systems is important 
for multiple applications. First of all, it can be used to significantly increase the range of
molecules amenable to rigorous scattering theory analysis. Consider, for example, the collision
properties of benzene C6H6 and benzene substitutes C6H5-X, where X is a halogen atom
or a molecular group. The D6h symmetry of benzene reduces the numerical complexity of the
quantum scattering calculations to a great extent [3]. The absence of this symmetry in C6H5-X
makes the scattering calculations much more computationally difficult, often impossible.
It is, therefore, desirable to develop an approach for predicting the scattering observables
for benzene substitutes, given the computed scattering observables for C6H6. Second, it is
often necessary to compare the collision properties of different molecules for calibrating the
experimental observations and/or the design of new experiments [5-7]. However, the direct
measurements for some molecular species may often be difficult or impossible.

The extrapolation of the collision observables from one polyatomic molecule to another 
is a difficult task. Consider, for example, the interaction of C6H6 and C6H5-X with a
He atom. The substitution of H by X distorts the highly symmetric PES for the He -
molecule interaction. If X is a halogen atom, it may be possible to evaluate the effect of the
substitution by computing the scattering cross sections for collisions of C6H5X with He on
a distorted benzene - He PES as functions of the distortion within a physically reasonable
range of distortions. However, the problem becomes much more complex if -X is a molecular
group, such as -CH3 or -CN. The introduction of the molecular group -X changes the
number of the internal degrees of freedom and the distortion of the PES becomes a function
of many parameters, determined by the relative positions of the atoms in the molecular group
-X. For example, if -X is -CN, the distortion of the PES is a function of 6 parameters. It is
impossible to analyze the effect of the -H → -CN substitution on the scattering observables
by direct scattering calculations on a grid of these six parameters.


To overcome this problem, we propose to apply the Gaussian Process (GP) model
used for machine learning applications in engineering technologies [13]. Given a
set of data depending simultaneously on multiple parameters, the GP model determines the
correlations between the data points in the entire data sample and provides a highly efficient,
non-parametric interpolation between the data points in the multi-dimensional space. The
Gaussian Process model is known to yield good prediction accuracy when trained by 10 to
100 data points per dimension [14]. An accurate six-dimensional hypersurface of scattering
observables can thus be obtained with only 60 to 600 dynamical calculations at randomly
chosen values of the 6 PES parameters. Here, we show that this can be exploited to provide
a prediction interval of the scattering observables for a molecule A-X, given the scattering
observable of the molecule A-H, where A and X are some molecular groups. To provide
predictions useful for the experiments in Ref. [1], we use the GP model for the extrapolation
of the scattering cross sections for benzene - He collisions to determine the prediction range
of the scattering cross sections for benzonitrile - He collisions.
GAUSSIAN PROCESS MODEL OF SCATTERING CROSS SECTIONS


This work extends our previous contribution [15], where we proposed to apply the GP
model for the analysis of the sensitivity of the scattering dynamics of complex molecules to
details of the PES. In the present work we assume that the scattering cross sections of a
molecule A-H are known, either from an experiment or from a quantum dynamics calculation.
The goal is to formulate the methodology to explore the effects of the substitution -H →
-X on the scattering cross sections without explicitly computing the PES for A-X. We begin
with a general formulation and specialize the problem to the collision systems of benzene
and benzonitrile with He in the next section.

We assume that X is a molecular group consisting of N atoms. For simplicity, we consider 
the interaction of the molecules A-H and A-X with a structureless atom Rg and assume that 
the interaction of A-H and A-X with Rg is described by a single adiabatic PES. This implies 
that both of the molecules and the Rg atom have closed electronic shells. The substitution 
of a hydrogen atom with an N-atom molecular group adds 3N - 3 degrees of freedom.
The change of the PES due to this substitution depends on the coordinates specifying the 
position of each atom in the group —X relative to Rg. The PES for the interaction of the 
molecule A-X with Rg can be most generally written as 


$$V_{\rm A-X} = V_{\rm A-H} + V_{\rm add}(\mathbf{r}, \mathbf{R}), \tag{1}$$

where $V_{\rm A-H}$ is the PES for the interaction of the molecule A-H with Rg, the vector
$\mathbf{R} = \mathbf{R}_{\rm A-X} - \mathbf{R}_{\rm Rg}$ specifies the separation between the position $\mathbf{R}_{\rm A-X}$ of the center of mass of
the molecule A-X and the position $\mathbf{R}_{\rm Rg}$ of the Rg atom, and $\mathbf{r} = (r_1, r_2, \ldots, r_{\mathcal{N}})^\top$ is a
vector with $\mathcal{N} = 3N$ components that represent the positions of the individual atoms in the
molecular group -X relative to $\mathbf{R}_{\rm A-X}$. The allowed range of the coordinates $r_i$ is restricted
by the parameters of the individual bonds in the molecular group -X.

A rigorous calculation of the scattering cross sections for collisions of A-X with Rg requires
the evaluation of the PES (1) as a function of $\mathbf{r}$ and $\mathbf{R}$, followed by the scattering
calculations. This is generally a prohibitively difficult task that we want to avoid. Instead,
we propose to determine the range of collision cross sections given physically reasonable
variations of the PES as a function of the coordinates $r_i$. The specific implementation of
this procedure is illustrated in the next section.
The coordinates can be written as $r_i = r_i^0 + \delta r_i$, where $r_i^0$ are the coordinates of
the atoms in the equilibrium geometry of the molecular group -X, and $\delta r_i$ describe the
deviations from the equilibrium geometry. The equilibrium geometry is generally unknown,
so the equilibrium coordinates $r_i^0$ must be treated as unknown parameters. The value of
the PES change $V_{\rm add}(\mathbf{r}, \mathbf{R})$ at each point of $\mathbf{r}$ and $\mathbf{R}$ can be restricted on physical grounds
to be within $\Delta V(\mathbf{r}, \mathbf{R})$. For the methodology proposed here, it is necessary to represent
$V_{\rm add}(\mathbf{r}, \mathbf{R})$ by a series of analytical functions characterized by a finite set of parameters
$\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_k)^\top$, such that the variation of the individual parameters $\alpha_i$ within a certain
range changes $V_{\rm add}(\mathbf{r}, \mathbf{R})$ within $\Delta V(\mathbf{r}, \mathbf{R})$. The change of the PES in Eq. (1) thus becomes
a function of these parameters: $V_{\rm add}(\mathbf{r}, \mathbf{R}) \to V_{\rm add}(\mathbf{r}, \mathbf{R}|\boldsymbol{\alpha})$. Since $r_i^0$ are assumed to be
unknown, 3N of the parameters in the vector $\boldsymbol{\alpha}$ must correspond to $r_i^0$.

With this formulation, the problem is reduced to determining the variation of the scattering
cross sections as functions of $\alpha_1, \ldots, \alpha_k$. If the computation of the PES is to be avoided,
one could, in principle, compute the scattering cross sections as functions of the individual
parameters $\alpha_i$, thus producing a prediction interval of the scattering observables. Assuming
that $V_{\rm add}(\mathbf{r}, \mathbf{R}|\boldsymbol{\alpha})$ can be parametrized by ten parameters $\alpha_i$ in addition to 3N values of
the equilibrium positions $r_i^0$, the total number of parameters is 3N + 10. This ranges from
16 for a two-atom molecular group -X to 22 for a four-atom molecular group. It is clearly
unfeasible to perform dynamical scattering calculations, whether quantum or classical, on
a grid of these 22 parameters. We propose to use the GP model to determine the dependence
of the cross sections on $\boldsymbol{\alpha}$, in order to find the prediction interval of the cross sections
corresponding to the range of the PES variations $\Delta V(\mathbf{r}, \mathbf{R}|\boldsymbol{\alpha})$.

We consider the scattering cross section $\Omega$ as a function of q parameters described by
the vector $\mathbf{x}$. The components of the vector $\mathbf{x} = (x_1, x_2, \ldots, x_q)^\top$ are the collision energy, the
internal energy of the molecule A-X and the parameters $\boldsymbol{\alpha}$. The cross sections $\Omega$ can be
calculated by means of a classical trajectory method at fixed values of $(\alpha_1, \ldots, \alpha_k)$,
fixed values of the collision energy and well-defined internal energies of the molecule. Given
the calculated values of $\Omega$ at a small number of randomly chosen values of $(\alpha_1, \ldots, \alpha_k)$, we
determine the correlations between different values of $\Omega$. These correlations are then used
to train a GP model in order to make predictions of the cross sections for arbitrary values
of $(\alpha_1, \ldots, \alpha_k)$ within a given interval. The number of computed cross sections necessary to
make accurate predictions can be estimated to be 10 × the number of parameters [14]. With
22 parameters $\boldsymbol{\alpha}$ and the collision and internal energies as the additional free parameters,
the number of values required to determine an accurate dependence of $\Omega$ on $\boldsymbol{\alpha}$ should
be about 240. Whether this number of calculations yields sufficient accuracy can be easily
tested by computing the cross sections at new, arbitrary values of $\boldsymbol{\alpha}$ and comparing the
computed values with the GP model predictions.

A GP can be viewed as a family of random functions normally distributed around a mean
function. We denote the GP by $F(\cdot)$. The realization $F(\mathbf{x})$ of a GP at a particular site $\mathbf{x}$
of the multi-parameter space is a value of a function randomly drawn from this family of
functions and evaluated at $\mathbf{x}$. In the following, we will refer to $F(\cdot)$ as a GP or as a Gaussian
random function. A GP is completely defined by its mean function $\mu(\cdot)$ and a covariance
function $K(\cdot, \cdot)$. We assume a GP with a constant variance $\sigma^2$ so that $K(\cdot, \cdot) = \sigma^2 R(\cdot, \cdot)$,
where $R(\cdot, \cdot)$ is a correlation function. The random outputs $F(\mathbf{x})$ at fixed $\mathbf{x}$ thus form a
normal distribution with mean $\mu(\mathbf{x})$ and variance $\sigma^2$, and the multiple outputs at different
$\mathbf{x}$ jointly follow a multivariate normal distribution.

The scattering cross section at each value of $\mathbf{x}$ is assumed to be a realization of a Gaussian
process $F(\mathbf{x})$. Given a set of cross sections computed at a finite number of values $\mathbf{x}$, the
goal is to find $\mu(\cdot)$ and $R(\cdot, \cdot)$. Since the correlation function is fitted to calculated data, the
choice of $R(\cdot, \cdot)$ is somewhat flexible. We use the following analytical form for the correlation
function, known as the Matérn correlation function [19-22]:

$$R(\mathbf{x}, \mathbf{x}') = \prod_{i=1}^{q} \frac{1}{\Gamma(\nu)\, 2^{\nu-1}}\, d_i^{\nu}\, K_\nu(d_i), \tag{2}$$

where $d_i = \sqrt{2\nu}\,|x_i - x_i'|/\omega_i$, $K_\nu$ is the modified Bessel function of order $\nu$ and $\omega_i$ are the unknown
parameters representing the characteristic length scales of the correlation variations.
We fix the value of $\nu$ to $\nu = 5/2$, which reduces the correlation function to

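$$R(\mathbf{x}, \mathbf{x}') = \prod_{i=1}^{q} \left( 1 + \frac{\sqrt{5}\,|x_i - x_i'|}{\omega_i} + \frac{5 (x_i - x_i')^2}{3 \omega_i^2} \right) \exp\left( - \frac{\sqrt{5}\,|x_i - x_i'|}{\omega_i} \right). \tag{3}$$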


We find that the Matérn correlation function yields more accurate results for the present
application than the popular Gaussian correlation function [19-22] we used in the previous
work [15]. With the Gaussian correlation function, the GP is differentiable to any order.
With the Matérn correlation function with parameter $\nu$, the process is differentiable to order
k (k < $\nu$). Thus, with $\nu = 5/2$, the GP is twice differentiable.
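
As an illustration of Eq. (3), the $\nu = 5/2$ Matérn correlation between two input vectors can be evaluated in a few lines of Python. This is a minimal sketch and not the implementation used in this work; the function name `matern52` and its argument names are assumptions introduced here.

```python
import math


def matern52(x, x_prime, omega):
    """Product-form Matern correlation with nu = 5/2, following Eq. (3).

    x, x_prime : sequences of q input parameters
    omega      : sequence of q positive length scales (one per dimension)
    """
    corr = 1.0
    for xi, xpi, wi in zip(x, x_prime, omega):
        d = math.sqrt(5.0) * abs(xi - xpi) / wi
        # d*d/3 equals 5 (x_i - x_i')^2 / (3 omega_i^2)
        corr *= (1.0 + d + d * d / 3.0) * math.exp(-d)
    return corr
```

The correlation equals 1 for identical inputs and decays smoothly as the inputs move apart on the scale set by each $\omega_i$.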


The mean of the Gaussian random function $F(\cdot)$ can be modelled as

$$\mu(\mathbf{x}) = \mathbf{h}(\mathbf{x})^\top \boldsymbol{\beta}, \tag{4}$$

where $\mathbf{h} = (h_1(\mathbf{x}), \ldots, h_s(\mathbf{x}))^\top$ is a vector of s regression functions and
$\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_s)^\top$ is a vector of unknown coefficients. If the dependence of the cross sections
on any of the parameters in $\mathbf{x}$ is known, the regression functions $h_i(\mathbf{x})$ can be chosen
to mimic this dependence. This can make the GP model more efficient, i.e. fewer cross
section values may be required to achieve the desired level of accuracy at arbitrary values
of parameters. However, we note that the GP model does not rely on any specific form
of the regression functions in Eq. (4). In the present work, we assume that $h_1 = 1$ and
$h_{i>1} = 0$, which reduces Eq. (4) to a single unknown parameter $\beta$. The problem is thus
reduced to finding the parameters $\beta$, $\sigma^2$ and $\boldsymbol{\omega} = (\omega_1, \omega_2, \ldots, \omega_q)^\top$ that provide the most
accurate correlation function.

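The GP model analysis begins with the computation of the cross sections at n input
vectors $\mathbf{x}_1, \ldots, \mathbf{x}_n$ randomly chosen to cover the allowed interval of the parameters. These
vectors are referred to as the training sites. The multiple outputs of a GP at the training
sites $\mathbf{Y}^n = (F(\mathbf{x}_1), F(\mathbf{x}_2), \ldots, F(\mathbf{x}_n))^\top$ follow a multivariate normal distribution

$$\mathbf{Y}^n \sim {\rm MVN}(\mathbf{H}\boldsymbol{\beta}, \sigma^2 \mathbf{A}) \tag{5}$$

with the mean vector $\mathbf{H}\boldsymbol{\beta}$ and the covariance matrix $\sigma^2 \mathbf{A}$. Here, $\mathbf{H}$ is an n × s design matrix
with the s regressors for each training site $\mathbf{x}_i$ as the matrix elements

$$\mathbf{H} = \begin{pmatrix} h_1(\mathbf{x}_1) & \cdots & h_s(\mathbf{x}_1) \\ \vdots & \ddots & \vdots \\ h_1(\mathbf{x}_n) & \cdots & h_s(\mathbf{x}_n) \end{pmatrix} \tag{6}$$

and $\mathbf{A}$ is an n × n matrix defined as

$$\mathbf{A} = \begin{pmatrix} 1 & R(\mathbf{x}_1, \mathbf{x}_2) & \cdots & R(\mathbf{x}_1, \mathbf{x}_n) \\ R(\mathbf{x}_2, \mathbf{x}_1) & 1 & & \vdots \\ \vdots & & \ddots & \\ R(\mathbf{x}_n, \mathbf{x}_1) & \cdots & & 1 \end{pmatrix}. \tag{7}$$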
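If the coefficients $\boldsymbol{\omega} = (\omega_1, \omega_2, \ldots, \omega_q)^\top$ are known, the maximum likelihood estimators
(MLE) of $\boldsymbol{\beta}$ and $\sigma^2$ are given in terms of $\mathbf{H}$ and $\mathbf{A}$ as

$$\hat{\boldsymbol{\beta}}(\boldsymbol{\omega}) = (\mathbf{H}^\top \mathbf{A}^{-1} \mathbf{H})^{-1} \mathbf{H}^\top \mathbf{A}^{-1} \mathbf{Y}^n, \tag{8}$$

$$\hat{\sigma}^2(\boldsymbol{\omega}) = \frac{1}{n} (\mathbf{Y}^n - \mathbf{H}\hat{\boldsymbol{\beta}})^\top \mathbf{A}^{-1} (\mathbf{Y}^n - \mathbf{H}\hat{\boldsymbol{\beta}}), \tag{9}$$

where the hat over the symbol denotes the MLE. To find the MLE of $\boldsymbol{\omega}$, we maximize the
log-likelihood function

$$\log \mathcal{L}(\boldsymbol{\omega}|\mathbf{Y}^n) = -\frac{1}{2} \left[ n \log \hat{\sigma}^2 + \log(\det(\mathbf{A})) + n \right] \tag{10}$$

numerically by an iterative computation of the determinant $|\mathbf{A}|$ and the matrix inverse $\mathbf{A}^{-1}$.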
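
The estimators of Eqs. (8)-(10) can be sketched in Python with NumPy as follows. The sketch assumes the $\nu = 5/2$ Matérn correlation of Eq. (3); the function names, the Cholesky-based solves and the small diagonal `jitter` added for numerical stability are choices made for this illustration and are not taken from the original calculations.

```python
import numpy as np


def matern52_matrix(X, omega):
    """n x n correlation matrix A of Eq. (7) from the nu = 5/2 Matern form."""
    X = np.asarray(X, dtype=float)
    omega = np.asarray(omega, dtype=float)
    d = np.sqrt(5.0) * np.abs(X[:, None, :] - X[None, :, :]) / omega
    return np.prod((1.0 + d + d**2 / 3.0) * np.exp(-d), axis=2)


def profile_log_likelihood(omega, X, y, H=None, jitter=1e-10):
    """Return (log L, beta_hat, sigma2_hat) for fixed length scales omega."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if H is None:
        H = np.ones((n, 1))  # constant mean: h_1 = 1, h_{i>1} = 0
    A = matern52_matrix(X, omega) + jitter * np.eye(n)
    L = np.linalg.cholesky(A)  # A = L L^T

    def solve(b):
        # Applies A^{-1} to b through the two triangular factors
        return np.linalg.solve(L.T, np.linalg.solve(L, b))

    beta = np.linalg.solve(H.T @ solve(H), H.T @ solve(y))  # Eq. (8)
    resid = y - H @ beta
    sigma2 = resid @ solve(resid) / n                        # Eq. (9)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    log_like = -0.5 * (n * np.log(sigma2) + log_det + n)     # Eq. (10)
    return log_like, beta, sigma2
```

In practice the returned log-likelihood would be maximized over $\boldsymbol{\omega}$ with a numerical optimizer, for example by passing its negative to `scipy.optimize.minimize`.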

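The goal is to make a prediction of the cross section at an arbitrary input vector $\mathbf{x} = \mathbf{x}_0$,
given the values of the cross sections at the training sites. The values $Y_0 = F(\mathbf{x}_0)$ obtained
by multiple realizations of the GP at $\mathbf{x}_0$ and the multiple outputs of the GP at training sites
$\mathbf{Y}^n = (F(\mathbf{x}_1), F(\mathbf{x}_2), \ldots, F(\mathbf{x}_n))^\top$ are jointly distributed as

$$\begin{pmatrix} Y_0 \\ \mathbf{Y}^n \end{pmatrix} \sim {\rm MVN} \left\{ \begin{pmatrix} \mathbf{h}(\mathbf{x}_0)^\top \\ \mathbf{H} \end{pmatrix} \boldsymbol{\beta},\; \sigma^2 \begin{pmatrix} 1 & \mathbf{A}_0^\top \\ \mathbf{A}_0 & \mathbf{A} \end{pmatrix} \right\}, \tag{11}$$

where $\mathbf{A}_0 = (R(\mathbf{x}_0, \mathbf{x}_1), R(\mathbf{x}_0, \mathbf{x}_2), \ldots, R(\mathbf{x}_0, \mathbf{x}_n))^\top$ is a column vector specified by the correlation
function $R(\cdot|\hat{\boldsymbol{\omega}})$ with the MLE of $\boldsymbol{\omega}$. This means that the conditional distribution
of possible values $Y_0 = F(\mathbf{x}_0)$ given the values $\mathbf{Y}^n$ is a normal distribution

$$Y_0 | \mathbf{Y}^n, \boldsymbol{\beta}, \sigma^2, \boldsymbol{\omega} \sim N(m^*(\mathbf{x}_0), \sigma^{*2}(\mathbf{x}_0)) \tag{12}$$

with the conditional mean and variance given by

$$m^*(\mathbf{x}_0) = \mathbf{h}(\mathbf{x}_0)^\top \boldsymbol{\beta} + \mathbf{A}_0^\top \mathbf{A}^{-1} (\mathbf{Y}^n - \mathbf{H}\boldsymbol{\beta}), \tag{13}$$

$$\sigma^{*2}(\mathbf{x}_0) = \sigma^2 (1 - \mathbf{A}_0^\top \mathbf{A}^{-1} \mathbf{A}_0). \tag{14}$$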


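Note that the conditional mean given by Eq. (13) is shifted from the unconditional mean,
equal in our case to $\beta$, due to the point-to-point correlations $R(\cdot, \cdot)$. Replacing $\mathbf{Y}^n$ in Eq.
(13) with the vector of the known cross section values $\boldsymbol{\Omega} = (\Omega(\mathbf{x}_1), \ldots, \Omega(\mathbf{x}_n))^\top$, we obtain
the GP model prediction for the value of the cross section at $\mathbf{x}_0$

$$\Omega(\mathbf{x}_0) = \mathbf{h}(\mathbf{x}_0)^\top \boldsymbol{\beta} + \mathbf{A}_0^\top \mathbf{A}^{-1} (\boldsymbol{\Omega} - \mathbf{H}\boldsymbol{\beta}). \tag{15}$$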
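
The prediction step of Eqs. (13)-(15) can be sketched in Python with NumPy for the constant-mean case used here. The function names `matern52` and `gp_predict`, the plug-in use of the estimators of Eqs. (8) and (9), and the `jitter` term are assumptions of this illustration; the length scales `omega` are taken as given rather than optimized.

```python
import numpy as np


def matern52(XA, XB, omega):
    """Matrix of nu = 5/2 Matern correlations between the rows of XA and XB."""
    XA = np.atleast_2d(np.asarray(XA, dtype=float))
    XB = np.atleast_2d(np.asarray(XB, dtype=float))
    omega = np.asarray(omega, dtype=float)
    d = np.sqrt(5.0) * np.abs(XA[:, None, :] - XB[None, :, :]) / omega
    return np.prod((1.0 + d + d**2 / 3.0) * np.exp(-d), axis=2)


def gp_predict(x0, X, y, omega, jitter=1e-10):
    """Return the predicted value and conditional variance at the site x0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    A = matern52(X, X, omega) + jitter * np.eye(n)
    A0 = matern52(X, x0, omega)[:, 0]           # correlations with x0
    ones = np.ones(n)                           # design matrix H for h_1 = 1
    beta = (ones @ np.linalg.solve(A, y)) / (ones @ np.linalg.solve(A, ones))
    resid = y - beta
    sigma2 = resid @ np.linalg.solve(A, resid) / n
    mean = beta + A0 @ np.linalg.solve(A, resid)              # Eq. (15)
    var = sigma2 * (1.0 - A0 @ np.linalg.solve(A, A0))        # Eq. (14)
    return mean, var
```

At a training site the prediction reproduces the computed value with a variance close to zero, while far from all training sites it relaxes to $\beta$ with the variance $\sigma^2$.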




FROM BENZENE TO BENZONITRILE 


In this section we apply the GP model to predict the cross sections for elastic scattering
of benzonitrile (C6H5CN) with He, required for the interpretation of the experiments on
cooling polyatomic molecules in a buffer gas of He [1]. The PES for C6H5CN - He is currently
unknown. The quantum scattering calculations of cross sections for collisions of C6H5CN
with He are prohibitively difficult. At the same time, the interaction of benzene (C6H6) with
He can be accurately described using a semi-empirical bond-additive method [23, 24] and
the cross sections for collisions of C6H6 with He can be computed using an accurate coupled
states approach [25, 26] based on the time-independent quantum scattering theory. In
the present section, we show how the GP model can be used to obtain the range of the cross
sections for He - C6H5CN collisions, given the cross sections for He - C6H6 collisions.

We use the semi-empirical approach of Pirani and coworkers [23, 24] for constructing


the PES for the interactions of the polyatomic molecules with He. This method treats a 
polyatomic hydrocarbon molecule as an ensemble of C-C and C-H bonds and represents the
PES for the molecule - He interaction as a sum of pairwise He - CH bond and He - CC bond
interaction energies, optimized based on accurate ab initio calculations and measurements
of bond polarizabilities. This method of representing the PES is particularly well suited for
the analysis of the effect of the -H → -X substitution on the molecular scattering properties
using the GP model described in the previous section.

Within the approach of Pirani and coworkers [23, 24], each He-CH and He-CC bond interaction
is represented by the following analytical function:

$$V_{ab}(r, \theta) = \varepsilon(\theta) \left[ \frac{3}{2 + 2x^2} \left( \frac{1}{x} \right)^{10 + 4x^2} - \frac{5 + 2x^2}{2 + 2x^2} \left( \frac{1}{x} \right)^{6} \right], \tag{16}$$

where

$$\varepsilon(\theta) = \varepsilon_\perp \sin^2(\theta) + \varepsilon_\parallel \cos^2(\theta), \tag{17}$$

$$r_m(\theta) = r_{m\perp} \sin^2(\theta) + r_{m\parallel} \cos^2(\theta), \tag{18}$$

r is the distance between the atom and the center of the bond, and x is the reduced distance
$x = r/r_m(\theta)$, where $r_m(\theta)$ is the position of the potential well and $\varepsilon(\theta)$ is the energy at
the bottom of the potential well. The parameters $\varepsilon_\perp$, $\varepsilon_\parallel$, $r_{m\perp}$ and $r_{m\parallel}$ represent the well
depth and location for the perpendicular and parallel approaches of the He atom to the
corresponding bond and $\theta$ is the angle between the bond axis and the vector connecting the
He atom to the center of the bond axis. The eight parameters specifying the PES for the
He - C6H6 interaction are given in Table I.


System                r_m⊥ (Å)   r_m∥ (Å)   ε_⊥ (meV)   ε_∥ (meV)   Reference

Aromatic CC-He        3.583      4.005      0.860       0.881
Non-aromatic CC-He    3.09       4.10       1.03        0.66        [27]
CH-He                 3.234      3.480      1.364       1.016


TABLE I. Parameters for the He-CH, He-CC aromatic and He-CC non-aromatic bond interactions.
The distances are given in Å and the energies are in meV.

In order to construct the PES for the He - C6H5CN interaction, we use the same approach.
However, the PES for the He - C6H5CN system must include, in addition to the aromatic
CC - He and CH - He interactions, the parameters describing the interaction of the He
atom with the non-aromatic single CC bond and the -CN bond fragment. The parameters
for the He - non-aromatic, single CC bond interaction are known from the work in Refs.
[24, 27] and are listed in Table I. The parameters for the He - CN bond interaction are, at
present, unknown and we propose to treat them as variable parameters. The PES for He
- C6H5CN collisions is thus parametrized by twelve parameters listed in Table I and four
unknown parameters $r^*_{m\perp}$, $r^*_{m\parallel}$, $\varepsilon^*_\perp$ and $\varepsilon^*_\parallel$ characterizing the locations and depths of the
potential wells arising from the perpendicular and parallel approaches of the He atom to the
-CN bond fragment in the molecule. We treat these parameters as variables in the input
vector $\mathbf{x}$ of the GP model. The vector $\mathbf{x}$ is thus a five-dimensional (5D) vector, containing
the four parameters $r^*_{m\perp}$, $r^*_{m\parallel}$, $\varepsilon^*_\perp$ and $\varepsilon^*_\parallel$ and the collision energy. In this work, we fix the
internal energy of the molecule to correspond to the ground rotational state.

Given that the size of the N atom is smaller than the size of the C atom and that the C-N
bond is more polarized than the C-C bond, one should expect that the He - CN interaction
parameters should lead to a larger potential depth but smaller equilibrium distance than
the parameters of the He - CC bond interactions. This puts the upper limit on the values
of $r^*_{m\perp}$ and $r^*_{m\parallel}$ and the lower limit on the values of $\varepsilon^*_\perp$ and $\varepsilon^*_\parallel$. In order to explore the
dependence of the collision cross sections on the variation of these parameters, we construct
100 different PESs for the He - C6H5CN system with the variable parameters in the following
ranges: $r^*_{m\perp}$ ~ Unif(2.5, 3.6) Å, $r^*_{m\parallel}$ ~ Unif(3.0, 4.0) Å, $\varepsilon^*_\perp$ ~ Unif(0.86, 2.5) meV,
$\varepsilon^*_\parallel$ ~ Unif(0.88, 2.5) meV.
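
Drawing the parameter sets for the 100 trial PESs from these uniform ranges can be sketched in Python as follows. The dictionary keys, the function name `sample_pes_parameters` and the fixed random seed are assumptions of this illustration; the collision energy, whose sampling range is not specified here, is left out.

```python
import random

# Uniform ranges of the four He - CN bond parameters
# (distances in angstrom, energies in meV).
RANGES = {
    "r_m_perp": (2.5, 3.6),
    "r_m_par": (3.0, 4.0),
    "eps_perp": (0.86, 2.5),
    "eps_par": (0.88, 2.5),
}


def sample_pes_parameters(n_sets=100, seed=0):
    """Return n_sets dictionaries, each holding one random draw of the four parameters."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
        for _ in range(n_sets)
    ]
```

Each dictionary defines one trial PES; combined with a collision energy it gives one five-dimensional training site for the GP model.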

The substitution of the H atom in benzene with the molecular group CN is expected to
introduce a local change of the global PES. To illustrate this, we plot in Figure 1 the cross
sections of the PES for the He - C6H5CN and He - benzene interactions for various geometries
of the atom - molecule approach. Figure 1 illustrates that the CN molecular group leads to
a significant change of the potential energy only when the He atom approaches the CN side
of the molecule. This justifies the representation (1) of the global PES and the approach
adopted here.


RESULTS 


Since the quantum scattering calculations of cross sections for He - C6H5CN collisions are
prohibitively difficult, we use the classical trajectory method developed in Refs. [28, 29] for
the dynamical calculations for benzonitrile. Using the classical trajectory computations, as
described in detail in Ref. [...], we compute the scattering cross sections for 100 combinations
of different PESs and collision energies. Each computation represents an average of 5000
trajectories. The cross sections are computed using the corrected version of Eq. (23) in
Ref. [...]. These cross sections are then used to train the GP model in order to provide
the global dependence of the cross sections on the collision energy and the four unknown
interaction potential parameters. The resulting ranges of the cross sections are then scaled
by the ratio $\Omega_{\rm QM}/\Omega_{\rm CT}$, where $\Omega_{\rm QM}$ is the thermally averaged cross section for He - benzene
collisions computed with the coupled states method [25, 26] and $\Omega_{\rm CT}$ is the same cross section
computed with the classical trajectory method used for benzonitrile.

Our first goal is to obtain the range of the cross sections corresponding to a physical
variation of the He - CN bond interaction parameters $r^*_{m\perp}$, $r^*_{m\parallel}$, $\varepsilon^*_\perp$, $\varepsilon^*_\parallel$ and the atom -
molecule collision energy. Figure 2 shows the variation of the cross sections as a function
of one of the PES parameters, with the remaining PES parameters and the collision energy 
chosen for each point at random. As evident from Figure 2, the scattered points cannot be 
assumed to follow any well-defined dependence. 

We obtain the global dependence of the cross sections on the four PES parameters and 
the collision energy by computing the cross sections at 100 points, placed randomly and 
quasi-uniformly, in this 5D parameter space. These points are used to train the GP model
as discussed in the previous section. Given the GP model, we compute the global 5D surface 
illustrated in Figure 3. In order to demonstrate the accuracy of the global 5D surface, we 
computed the cross sections at a different set of 100 points randomly chosen in the 5D 
parameter space. Figure 4 compares the predicted values with the computed values for 
these 100 points. 

In order to quantify the error of the GP model, we compute the empirical root mean
squared error (ERMSE) defined as

$$\mathcal{E}_e = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{19}$$

and the scaled root mean squared error (SRMSE) defined by $\mathcal{E}_s = \mathcal{E}_e/(y_{\max} - y_{\min})$. In Eq.
(19), n is the number of the cross section values, $y_i$ represents the values of the computed
cross sections and $\hat{y}_i$ the value predicted by the GP model. For the model with only 50
scattering calculations used as training points, $\mathcal{E}_e = 2.17$ and $\mathcal{E}_s = 4.4$ %. If the number
of the scattering calculations is increased to 100, the errors decrease to $\mathcal{E}_e = 1.77$ and
$\mathcal{E}_s = 3.6$ %.
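
The two error measures can be computed directly from the lists of computed and predicted cross sections. The Python sketch below is an illustration; the function names `ermse` and `srmse` are assumptions, and the SRMSE is taken to be the ERMSE divided by the range of the computed values.

```python
import math


def ermse(computed, predicted):
    """Empirical root mean squared error of Eq. (19)."""
    n = len(computed)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(computed, predicted)) / n)


def srmse(computed, predicted):
    """ERMSE scaled by the range (max - min) of the computed values."""
    return ermse(computed, predicted) / (max(computed) - min(computed))
```

Multiplying the value returned by `srmse` by 100 gives the error as a percentage of the range of the computed cross sections.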


Given the GP model of the collision cross sections trained by the computations at random
values of $r^*_{m\perp}$, $r^*_{m\parallel}$, $\varepsilon^*_\perp$, $\varepsilon^*_\parallel$ and the atom - molecule collision energy, we can use Eq. (15)
to examine the variation of the collision energy dependence of the cross sections as the
four parameters are varied within the physical ranges specified in the previous section. This
produces a band of the cross section values for each value of the collision energy, presented in
Figure 5. The results illustrate that the physical variation of the PES parameters describing
the interaction of the He atom with the CN group changes the cross sections by less than
33 %, with the percentage defined as the difference between the maximum and minimum
values divided by the mean value. The cross sections for benzonitrile - He collisions computed
with an accurate PES are expected to fall within the grey area of Figure 5 with a
probability of 95 %.
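
The width of such a band can be estimated from any trained surrogate by sampling the PES parameters within their ranges at a fixed collision energy. The Python sketch below assumes a hypothetical callable `predict(params, energy)` standing in for the GP prediction; taking the plain minimum and maximum over random samples is a choice made for this illustration and is not necessarily how the band in Figure 5 was constructed.

```python
import random


def prediction_band(predict, ranges, energy, n_samples=2000, seed=0):
    """Return (lowest, highest, percent width) of predict over sampled parameters.

    predict : hypothetical callable, predict(params, energy) -> cross section
    ranges  : list of (low, high) tuples, one per PES parameter
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        params = [rng.uniform(lo, hi) for lo, hi in ranges]
        values.append(predict(params, energy))
    lowest, highest = min(values), max(values)
    mean = sum(values) / len(values)
    # Percentage as defined in the text: (max - min) / mean
    return lowest, highest, 100.0 * (highest - lowest) / mean
```

Repeating the call over a grid of collision energies traces out the lower and upper edges of the band.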

Since the GP model provides the global dependence of the cross sections on the underlying
parameters, it is possible to perform the analysis of the sensitivity of the cross section
variations to the individual PES parameters by using the functional analysis of variance
decomposition [30]. The results shown in Figure 6 illustrate that, of the four unknown
parameters, $\varepsilon^*_\parallel$ has the biggest effect on the variation of the cross sections. Figure 6 also
shows that the effect of the individual parameters is largely correlated with the values of the
other parameters. This means that the prediction interval in Figure 5 must be obtained by
the simultaneous variation of each of the underlying parameters and cannot be obtained by
a simple variation of one of the parameters with the other parameters kept fixed at a few
random values.

The final results of this work are presented in Figure 7, showing the thermally averaged
cross sections for benzene - He collisions computed with the quantum CS approach
(blue squares) and the interval of the thermally averaged cross sections for benzonitrile -
He collisions obtained by integrating the results in Figure 5 with the Maxwell-Boltzmann
distribution of collision energies and scaling the thermally averaged cross sections by the
ratio $\Omega_{\rm QM}/\Omega_{\rm CT}$. Figure 7 shows that the presence of the molecular group CN enhances the
elastic scattering cross sections by a factor of 1.1 - 1.5. The uncertainty of the parameters
in the He - CN group interaction leads to an uncertainty of ~ 27 % in the final thermally
averaged cross section. This percentage is defined as the difference between the maximum
and minimum values of the grey band in Figure 7 divided by the corresponding mean value.


SUMMARY 

In the present work we consider a general problem of extrapolating the known collision
properties of a complex polyatomic molecule A-H to make predictions for another molecule
related to A-H by the substitution of the H atom with a small molecular group. Using
the example of C6H6 - He and C6H5CN - He collision systems, we show that the -H →
-CN substitution leads to a local modification of the global potential energy surface. While
the quantitative effect of the -H → -CN substitution on the PES is unknown, it can
be parametrized by a finite number (k) of parameters $(\alpha_1, \ldots, \alpha_k)$, collectively denoted by
$\boldsymbol{\alpha}$. With this parametrization, the effect of the -H → -CN substitution on the collision
observable $\Omega$ is embodied in a multidimensional function $\Omega(\boldsymbol{\alpha})$. Even if the parameters
$\boldsymbol{\alpha}$ are unknown, once the dependence of $\Omega$ on $\boldsymbol{\alpha}$ is determined, it can be used to obtain the
prediction interval for $\Omega$ given physically reasonable variation of $\boldsymbol{\alpha}$.

We propose to apply the Gaussian Process model to determine $\Omega(\boldsymbol{\alpha})$. The model requires
about 10 × (k + 1) calculations of the scattering observables to provide accurate results.
Thus, with k = 4, an accurate five-dimensional dependence of $\Omega$ on $\boldsymbol{\alpha}$ and on the collision
energy can be obtained with about 50 dynamical calculations. We showed that this procedure
can be used to obtain an accurate dependence of the cross sections for collisions of
C6H5CN with He on the parameters describing the interaction of the He atom with the CN
fragment of the molecule. The results are then compared with the cross section for He -
C6H6 collisions known from the quantum scattering calculations in order to provide a prediction
interval of the thermally averaged cross sections for collisions of C6H5CN with He.
This allowed us to obtain the prediction interval (Figure 7) of the collision cross sections for
C6H5CN - He collisions without the knowledge of the potential energy surface.

This work illustrates that the Gaussian Process model can be used for a variety of 
applications in molecular dynamics research. For example, once the function Ω(α) is obtained, 
it can be used to perform a sensitivity analysis (such as in Figure 6) to determine which 
of the PES parameters are more or less important in determining the collision observable Ω. 
The dependence of Ω on the collision energy, which is treated as one of the unknown model 
parameters, can be used for an efficient integration of the collision properties to obtain 
thermally averaged observables. The function Ω(α) can also be used to integrate out the dependence 
of the collision observables on the PES parameters, thereby reducing the sensitivity of the 
predictions to inaccuracies of the PES calculations and providing the error bars associated 
with the uncertainties of the PES calculations. 
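
As a rough illustration of the last two uses, the sketch below thermally averages a surrogate Ω(α, E) over the collision energy and then takes percentiles over sampled PES parameters to form an interval. It rests on several assumptions: `toy_surrogate` is a hypothetical stand-in for the trained model, the parameters are sampled uniformly on a unit interval, energies are in units of kT, and the thermal average uses the common flux-weighted form (kT)^-2 ∫ σ(E) E exp(-E/kT) dE, which may differ from the convention used in the text. The interval reflects only the variation of the parameters, not the predictive variance of the model.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule along the last axis."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def thermal_average(sigma, energy, kT):
    """Flux-weighted Maxwell-Boltzmann average of sigma(E) (assumed form)."""
    weight = energy * np.exp(-energy / kT)
    return trapezoid(sigma * weight, energy) / kT**2

def toy_surrogate(alpha, energy):
    """Hypothetical stand-in for the trained model Omega(alpha, E)."""
    base = 100.0 + 10.0 * alpha[:, :1] - 5.0 * alpha[:, 1:2]
    return base / (1.0 + energy[None, :]) ** 0.2

rng = np.random.default_rng(1)
alpha_samples = rng.random((500, 4))          # PES parameters scaled to [0, 1]
kT = 1.0
energy = np.linspace(0.0, 40.0 * kT, 2001)    # grid wide enough for the tail

sigma = toy_surrogate(alpha_samples, energy)  # shape (500, 2001)
sigma_T = thermal_average(sigma, energy, kT)  # one value per parameter sample
mean_T = sigma_T.mean()
lower, upper = np.percentile(sigma_T, [2.5, 97.5])
```

A quick sanity check is that a cross section independent of energy is returned unchanged by `thermal_average`, since the weight E exp(-E/kT) integrates to (kT)^2.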

The computational effort associated with training the Gaussian Process model is 
determined by matrix inversion and scales as the third power of the number of training points. 
Given that the number of training points required for accurate predictions is typically 10 
times the number of unknown parameters and that it has now become routine to invert 
matrices with the dimension of 1000 × 1000, the methodology proposed here can be applied 
to determine the dependence of the scattering observables on up to 100 unknown 
parameters. It is thus easy to envision an application where the Gaussian Process model is used to 
characterize the dependence of a dynamical observable on all PES parameters describing the 
interaction of two polyatomic molecules. This dependence can be used to determine which 
of the atoms in the two molecules are important for the detectable outcome of the molecule 
- molecule interaction and which of the atoms can be parametrized by simple functions 
irrelevant for the outcome of the interaction. 
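
The scaling argument above amounts to simple arithmetic, sketched here in Python. The function names are hypothetical, and the cost estimate keeps only the cubic term of the matrix factorization, ignoring prefactors and the cost of the dynamical calculations themselves.

```python
def n_training_points(n_inputs, points_per_input=10):
    """Rule of thumb from the text: about 10 training points per unknown input."""
    return points_per_input * n_inputs

def relative_training_cost(n_points, n_reference):
    """Cubic scaling of the matrix inversion, relative to a reference size."""
    return (n_points / n_reference) ** 3

n_small = n_training_points(5)     # 4 PES parameters plus the collision energy
n_large = n_training_points(100)   # 100 unknown parameters
ratio = relative_training_cost(n_large, n_small)

print(n_small, n_large, ratio)     # prints: 50 1000 8000.0
```

Under these assumptions, going from the five-dimensional problem treated here to 100 unknown parameters increases the training cost by a factor of about 8000, while the matrix remains within the 1000 × 1000 size quoted above.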
FIG. 1. Radial dependence of the interaction energy of benzonitrile with He averaged over the four 
variable parameters r_m⊥, r_m∥, ε_⊥ and ε_∥ in the out-of-plane configuration (black solid line), the 
H-vertex-in-plane configuration (red solid line), the CN-vertex-in-plane configuration (green solid 
line) and the side-in-plane configuration (blue solid line), compared with the corresponding 
interaction energy of benzene with He (black dashed line for the out-of-plane configuration, red dashed 
line for the vertex-in-plane configuration, and blue dashed line for the side-in-plane configuration). 
The relative geometries of the four limiting configurations are depicted near each curve. 
FIG. 2. The dependence of the elastic scattering cross section on each of the four PES parameters 
for benzonitrile-He collisions. The other three parameters and the collision energy are chosen at 
random for each point shown. 
FIG. 3. The global 5D response surface produced by the GP model illustrated in three dimensions, 
with the collision energy and one of the PES parameters plotted on the x- and y-axes. The cross 
sections shown are averaged over the other three PES parameters. 
FIG. 4. Accuracy of the GP model with variable PES parameters for the prediction of the elastic 
scattering cross sections. The scatter plot compares the predicted values with the computed values 
at 100 points. The error of the GP model is the deviation of the points from the diagonal line. 
FIG. 5. Collision energy dependence of the elastic scattering cross section averaged over the four PES 
parameters r_m⊥, r_m∥, ε_⊥ and ε_∥ for benzonitrile-He collisions (red solid line) and the associated 
95% prediction interval (grey band). The line with blue circles shows the corresponding cross 
section for benzene-He collisions computed by the classical trajectory method. The results are for 
collisions of molecules in the ground rotational state. 
FIG. 6. The relative effect of the variation of the four PES parameters r_m⊥, r_m∥, ε_⊥ and ε_∥ on 
the elastic scattering cross sections. The blue area of the bars shows the uncorrelated contribution 
of the corresponding parameter and the green area shows the joint effect that depends on the value of 
the other PES parameters and the collision energy. 
FIG. 7. The comparison of the thermally averaged cross sections for benzene - He collisions 
computed with the quantum CS approach (blue squares) with the 95% prediction interval of the 
thermally averaged cross sections for benzonitrile - He collisions (grey band). The prediction 
interval for the benzonitrile - He collisions and the corresponding mean values (red solid line) are 
obtained by integrating the results in Figure 5 with the Maxwell-Boltzmann distribution and then 
scaling the thermally averaged cross sections by the ratio Ω_QM/Ω_CT, as described in the text. The 
results are for collisions of molecules in the ground rotational state. 





[1] D. Patterson and J. M. Doyle, Phys. Chem. Chem. Phys. 17, 5372 (2015). 

[2] Z. Li, R. V. Krems, and E. J. Heller, J. Chem. Phys. 141, 104317 (2014). 

[3] Note that Eq. (23) of Ref. [2] incorrectly includes the factor 4. The correct equation is σ_{i→i} = 
π b_max^2 N_{i→i}/N_tot. The computations reported in Ref. [2] used the correct equation. 

[4] J. R. Taylor, Scattering Theory: the Quantum Theory of Nonrelativistic Collisions (Dover 
Publications, 2006). 

[5] J. Küpper, F. Filsinger, and G. Meijer, Faraday Discuss. 142, 155 (2009). 

[6] C. Sommer, L. van Buuren, M. Motsch, S. Pohle, J. Bayerl, P. Pinkse, and G. Rempe, Faraday 
Discuss. 142, 203 (2009). 

[7] D. Patterson, E. Tsikata, and J. M. Doyle, Phys. Chem. Chem. Phys. 12, 9736 (2010). 

[8] J. Sacks, S. B. Schiller, and W. J. Welch, Technometrics 31, 41 (1989). 

[9] T. J. Santner, B. J. Williams, and W. I. Notz, The Design and Analysis of Computer 
Experiments (Springer Science & Business Media, New York, 2003). 

[10] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (The MIT 
Press, Cambridge, 2006). 

[11] D. Higdon, M. Kennedy, J. C. Cavendish, J. A. Cafeo, and R. D. Ryne, SIAM J. Sci. Comput. 
26, 448 (2004). 

[12] D. Higdon, J. Gattiker, B. Williams, and M. Rightley, J. Am. Statist. Assoc. 103, 570 (2008). 

[13] R. B. Gramacy and H. K. H. Lee, J. Am. Statist. Assoc. 103, 1119 (2008). 

[14] J. L. Loeppky, J. Sacks, and W. J. Welch, Technometrics 51, 366 (2009). 

[15] J. Cui and R. V. Krems, accepted for publication in Phys. Rev. Lett. (arXiv:1503.01432v2). 

[16] R. B. Bernstein, Atom-Molecule Collision Theory: A Guide for the Experimentalist (Plenum, 
New York, 1979). 

[17] R. J. Adler, The Geometry of Random Fields (SIAM, 1981). 

[18] H. Cramér and M. R. Leadbetter, Stationary and Related Stochastic Processes: Sample 
Function Properties and Their Applications (Courier Corporation, 2013). 

[19] T. Mitchell, M. Morris, and D. Ylvisaker, Stoch. Proc. Appl. 35, 109 (1990). 

[20] N. A. C. Cressie, Statistics for Spatial Data (Wiley-Interscience, New York, 1993). 





[21] M. L. Stein, Interpolation of Spatial Data: Some Theory for Kriging (Springer Science & 
Business Media, 1999). 

[22] M. Abt, Scand. J. Stat. 26, 563 (1999). 

[23] F. Pirani, D. Cappelletti, and G. Liuti, Chem. Phys. Lett. 350, 286 (2001). 

[24] F. Pirani, M. Albertí, A. Castro, M. Moix Teixidor, and D. Cappelletti, Chem. Phys. Lett. 
394, 37 (2004). 

[25] B. J. Garrison and W. A. Lester, Jr., J. Chem. Phys. 66, 531 (1977). 

[26] S. Green, J. Chem. Phys. 64, 3463 (1976). 

[27] M. Bartolomei, private communication (October 2012). 

[28] Z. Li and E. J. Heller, J. Chem. Phys. 136, 054306 (2012). 

[29] J. Cui, Z. Li, and R. V. Krems, J. Chem. Phys. 141, 164315 (2014). 

[30] A. Saltelli, K. Chan, and E. M. Scott, Sensitivity Analysis (Wiley, New York, 2009). 

[31] A. Saltelli, M. Ratto, T. Andres, F. Campolongo, J. Cariboni, D. Gatelli, M. Saisana, and S. 
Tarantola, Global Sensitivity Analysis: the Primer (John Wiley & Sons, 2008). 

[32] O. Roustant, D. Ginsbourger, and Y. Deville, J. Stat. Softw. 51, 1 (2012). 




